The State of Global Terrorism

An In-Depth Analysis of Trends and Threats

Author

Shreehar Joshi

Terrorism has been a persistent obstacle to humankind’s pursuit of global peace and prosperity. From hostage situations and hijackings to mass shootings and bombings, terrorist attacks have a profound impact on both their victims and the larger society: they cause physical harm and loss of life, as well as emotional trauma and psychological distress. They can also have long-lasting socio-economic consequences, disrupting trade and commerce, causing job losses, and eroding investor confidence.

With terrorist attacks occurring more frequently than ever, it is crucial to understand their trends and patterns. In this blog post, I will examine various aspects of terrorism, including regions, targets, methods, and motives, using three open-source datasets: the Global Terrorism Database (GTD), which contains information on over 180,000 terrorist attacks worldwide from 1970 to 2017; World, Region, Country GDP/GDP per capita, which includes the GDP per capita of different countries from 1960 to 2021; and the World Bank National Accounts data, which provides the fertility rate and net migration of each country from 1955 to 2020.

I hope this project sheds some light on the phenomenon of global terrorism and better equips us to combat it in the future. So let’s roll up our sleeves and demystify the data from the world of global terrorism.

Code
import time
import warnings

import pandas as pd
import numpy as np
import plotly.express as px
import nltk
import tensorflow as tf
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import seaborn as sns
import bar_chart_race as bcr
from matplotlib.lines import Line2D
from wordcloud import WordCloud
from sklearn import neighbors
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, RobustScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Conv1D, MaxPooling1D, Flatten, LSTM
from tensorflow.keras.layers import Bidirectional, GRU

nltk.download("stopwords", quiet=True)  # stopword list used for the word clouds below
warnings.filterwarnings("ignore", category=FutureWarning)

df_attacks = pd.read_csv("../data/globalterrorismdb_0718dist.csv", encoding="ISO-8859-1", low_memory=False)
df_attacks.head()
df_attacks = df_attacks[['eventid','iyear', 'imonth', 'iday', 'country_txt', 'region_txt', 
'provstate', 'city', 'latitude', 'longitude', 'suicide', 'attacktype1_txt', 'targtype1_txt', 
'gname', 'motive', 'weaptype1_txt', 'nkill']]
df_attacks.rename(columns={"eventid": "Event ID", "iyear": "Year", "imonth": "Month", 
"country_txt": "Country", "region_txt": "Region", "provstate": "Province/State", "city": "City", "latitude": "Latitude", 
"longitude": "Longitude", "suicide": "Suicide", "attacktype1_txt": "Attack Type",
"targtype1_txt": "Target Type", "gname": "Terrorist Group", "motive": "Motive", 
"weaptype1_txt": "Weapon Type", "nkill": "Casualties"}, inplace=True)

df_population = pd.read_csv("../data/population.csv")
df_population = df_population[["Country","Year", "Migrants(net)", "FertilityRate"]]
df_population.rename(columns= {"FertilityRate": "Fertility Rate", "Migrants(net)": "Migrants (net)"}, inplace=True)

df_gdp = pd.read_csv("../data/world_country_gdp_usd.csv")
df_gdp = df_gdp[['Country Name','year', 'GDP_USD']]
df_gdp.rename(columns= {"Country Name": "Country", "year": "Year", "GDP_USD":"GDP (in USD)", "GDP_per_capita_USD": "GDP (per capita)"}, inplace=True)

df_us_population = pd.read_csv("../data/us_population.csv")
df_us_population = df_us_population[["state", "pop2022"]]
df_us_population.rename(columns= {"state": "State", "pop2022": "Population"}, inplace=True) 

fig = px.scatter_geo(df_attacks, lon="Longitude", lat="Latitude", animation_frame="Year", color="Region",
                     projection="equirectangular", animation_group="Year", title="Terrorist Attacks (1970 - 2017)")
fig.update_layout(title_x=0.44)
fig.show()

Figure 1: Global Terrorist Attacks

The animation in Figure 1 shows that there were a significant number of terrorist attacks in the US from 1970 to 2017. This is surprising, especially considering the effort the US has put into tackling terrorism in almost every terrorist-prone country over the past 50 years. So, before anything else, let’s analyze which US states have the highest number of such incidents.

Code
us_states = np.asarray(['AL', 'AK', 'AZ', 'AR', 'CA', 'CO', 'CT', 'DE', 'DC', 'FL', 'GA',
                        'HI', 'ID', 'IL', 'IN', 'IA', 'KS', 'KY', 'LA', 'ME', 'MD', 'MA',
                        'MI', 'MN', 'MS', 'MO', 'MT', 'NE', 'NV', 'NH', 'NJ', 'NM', 'NY',
                        'NC', 'ND', 'OH', 'OK', 'OR', 'PA', 'RI', 'SC', 'SD', 'TN', 'TX',
                        'UT', 'VT', 'VA', 'WA', 'WV', 'WI', 'WY'])
us_state_to_abbrev = {
    "Alabama": "AL",
    "Alaska": "AK",
    "Arizona": "AZ",
    "Arkansas": "AR",
    "California": "CA",
    "Colorado": "CO",
    "Connecticut": "CT",
    "Delaware": "DE",
    "Florida": "FL",
    "Georgia": "GA",
    "Hawaii": "HI",
    "Idaho": "ID",
    "Illinois": "IL",
    "Indiana": "IN",
    "Iowa": "IA",
    "Kansas": "KS",
    "Kentucky": "KY",
    "Louisiana": "LA",
    "Maine": "ME",
    "Maryland": "MD",
    "Massachusetts": "MA",
    "Michigan": "MI",
    "Minnesota": "MN",
    "Mississippi": "MS",
    "Missouri": "MO",
    "Montana": "MT",
    "Nebraska": "NE",
    "Nevada": "NV",
    "New Hampshire": "NH",
    "New Jersey": "NJ",
    "New Mexico": "NM",
    "New York": "NY",
    "North Carolina": "NC",
    "North Dakota": "ND",
    "Ohio": "OH",
    "Oklahoma": "OK",
    "Oregon": "OR",
    "Pennsylvania": "PA",
    "Rhode Island": "RI",
    "South Carolina": "SC",
    "South Dakota": "SD",
    "Tennessee": "TN",
    "Texas": "TX",
    "Utah": "UT",
    "Vermont": "VT",
    "Virginia": "VA",
    "Washington": "WA",
    "West Virginia": "WV",
    "Wisconsin": "WI",
    "Wyoming": "WY",
    "District of Columbia": "DC",
    "American Samoa": "AS",
    "Guam": "GU",
    "Northern Mariana Islands": "MP",
    "Puerto Rico": "PR",
    "United States Minor Outlying Islands": "UM",
    "U.S. Virgin Islands": "VI",
}
df_attacks_us = df_attacks[df_attacks["Country"] == "United States"] 
df_attacks_us = pd.DataFrame(df_attacks_us.groupby("Province/State")["Event ID"].count())
df_attacks_us = df_attacks_us.reset_index()
df_attacks_us.rename(columns={"Province/State": "State", "Event ID": "Number of Terrorist Attacks"}, inplace=True)
df_attacks_us = df_attacks_us[df_attacks_us["State"] != "Unknown"]
df_attacks_us["State Code"] = df_attacks_us["State"].apply(lambda x: us_state_to_abbrev[x])
def scale_column(df, column, minVal=None, maxVal=None):
    """Min-max scale a column to the [0, 1] range."""
    if minVal is None:
        minVal = df[column].min()
    if maxVal is None:
        maxVal = df[column].max()
    return [(val - minVal) / (maxVal - minVal) for val in df[column]]

df_us_population.head()
df_attacks_us = df_attacks_us.merge(df_us_population[['State', 'Population']])
df_attacks_us["Number of Terrorist Attacks (Standardised)"] = df_attacks_us["Number of Terrorist Attacks"] / df_attacks_us["Population"]
tempVal = scale_column(df_attacks_us, "Number of Terrorist Attacks (Standardised)")
df_attacks_us["Number of Terrorist Attacks (Standardised)"] = tempVal
df_attacks_us = df_attacks_us.sort_values(by="Number of Terrorist Attacks (Standardised)", ascending=False)
fig = px.choropleth(df_attacks_us, locations='State Code', color='Number of Terrorist Attacks (Standardised)',
                    color_continuous_scale="Viridis",
                    locationmode="USA-states", 
                    scope="usa",
                    labels={'Number of Terrorist Attacks (Standardised)':'No. of Terrorist Attacks'},
                    title="Terrorist Attacks in the US (1970-2017)")
fig.update_layout(title_x=0.44)
fig.update_layout( legend = {"xanchor": "right", "x": -0, "y":1.9})
fig.update_layout(height=500, width=780)
fig.show()

Figure 2: Terrorist Attacks in the US

Figure 2 shows the varying number of terrorist attacks across US states, calculated by dividing the total number of attacks in a given state by its population and then min-max scaling the result so that the state with the highest score is assigned a value of 1 and the state with the lowest score a value of 0. We see that New York, Oregon, California, Washington, and Nebraska are the five most terrorist-prone states in the US, while Kentucky, South Carolina, West Virginia, Alaska, and Arkansas are the safest states in terms of the frequency of terrorist attacks.
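The per-capita standardisation described above can be reproduced in isolation. The sketch below uses made-up state names, counts, and populations purely for illustration:

```python
import pandas as pd

# Hypothetical per-state attack counts and populations (illustrative values only)
df = pd.DataFrame({
    "State": ["A", "B", "C"],
    "Attacks": [30, 10, 40],
    "Population": [1_000_000, 2_000_000, 4_000_000],
})

# Per-capita attack rate, then min-max scaling: the highest-rate state
# maps to 1 and the lowest-rate state maps to 0
rate = df["Attacks"] / df["Population"]
df["Scaled"] = (rate - rate.min()) / (rate.max() - rate.min())
print(df[["State", "Scaled"]])  # A -> 1.0, B -> 0.0, C -> 0.2
```

Note that a raw count would simply favour populous states; dividing by population first is what makes the comparison across states meaningful.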

It might be interesting to see the motives behind the terrorist attacks in the US. So, let’s explore them, and to be more specific, let’s compare the motives of the terrorist attacks from 1970-1999 and 2000-2017.

Code
stpwrd = nltk.corpus.stopwords.words('english')
extended_list = ["specific",  "motive", "unknown", "Unknown", "incident", "claimed", "responsibility", "however", "unaffiliated", "individual", "identified", "killed", "stated", "anti", "attacks", "protest", "carried", "attack", "trend", "larger", "may", "part", "following", "community", "sources", "violence", "targeting", "noted", "posited", "suspected", "targeting", "members", "noted", "targeted", "also", "assailant", "perpetrator", "meant", "bring attention", "practice", "perpetrator", "assailant", "meant", "bring", "attention"]
stpwrd.extend(extended_list)

df_attacks_us = df_attacks[df_attacks["Country"] == "United States"]
df_attacks_us = df_attacks_us[["Year", "Motive"]].dropna()

def plot_motive_wordcloud(start_year, end_year, color):
    """Draw a word cloud of the attack motives recorded between start_year and end_year."""
    temp_df = df_attacks_us[(df_attacks_us["Year"] >= start_year) & (df_attacks_us["Year"] <= end_year)]
    motive = " ".join(temp_df["Motive"].values)
    wordcloud = WordCloud(width=1000, height=800,
                          background_color='white',
                          stopwords=stpwrd,
                          color_func=lambda *args, **kwargs: color,
                          min_font_size=10).generate(motive)
    plt.figure(figsize=(12, 12), facecolor=None)
    plt.imshow(wordcloud)
    plt.axis("off")
    plt.tight_layout(pad=2)
    plt.title(f"Attack Motives ({start_year} - {end_year})", fontdict={'fontsize': 36})
    plt.show()

plot_motive_wordcloud(1970, 1999, "green")
plot_motive_wordcloud(2000, 2017, "purple")

(a) 1970-1999

(b) 2000-2017

Figure 3: Attack Motives in the US

Both word clouds share the common theme of abortion, suggesting that it has been a prominent topic of discussion and conflict for several decades. However, the word clouds also differ in significant ways. The first, covering the pre-2000 period, reveals issues relevant to Puerto Rico, Vietnam, and African American groups. The second, covering the post-2000 period, shows themes related to Iraq, ISIL, and Islamic states, suggesting an increase in religiously motivated attacks over the past 20 years. This shift in topics also reflects changes in the domestic and international political landscape, from fighting the spread of communism and racism to confronting religiously motivated terrorism.

Now, let’s analyze how the frequency of terrorist attacks has changed over the last 50 years.

Code
yearly_freq = pd.DataFrame(df_attacks.groupby("Year")["Event ID"].count()).reset_index()
yearly_freq.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)
fig = px.bar(yearly_freq, x=yearly_freq["Year"], y=yearly_freq["Number of Terrorist Attacks"], title="Frequency of Terrorist Attacks (1970-2017)")
fig.update_layout(title_x=0.5)
fig.update_layout(height=400)
fig.update_layout({
    'plot_bgcolor': 'rgba(0,0,0,0)',
    'paper_bgcolor': 'rgba(0,0,0,0)'
})
fig.show()

Figure 4: Frequency of Terrorist Attacks

It is apparent from Figure 4 that the number of terrorist attacks reached its lows around 1972 and 2003 (note that the data for 1994 is missing rather than zero) and has increased sharply over the last decade.

But, what parts of the world have experienced the highest number of terrorist attacks?

Code
region_freq = pd.DataFrame(df_attacks.groupby(["Region", "Attack Type"])["Event ID"].count()).reset_index()
region_freq.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)
region_freq = region_freq.sort_values(by="Number of Terrorist Attacks", ascending=False)
region_freq['Attack Type'] = region_freq['Attack Type'].replace(['Bombing/Explosion', 'Hostage Taking (Kidnapping)', 'Facility/Infrastructure Attack', 'Hostage Taking (Barricade Incident)'], ['Bombing', 'Hostage', 'Facility Attack', 'Hostage (Barr.)'])
fig = px.bar(region_freq, x=region_freq["Region"], y=region_freq["Number of Terrorist Attacks"], color="Attack Type", height=400, title="Terrorist Attacks in Different Regions", barmode="relative")
fig.update_layout(title_x=0.5)
fig.update_layout(height=500)
fig.update_layout({
    'plot_bgcolor': 'rgba(0,0,0,0)',
    'paper_bgcolor': 'rgba(0,0,0,0)'
})
fig.show()

Figure 5: Terrorist Attacks in different Regions

Figure 5 shows that the Middle East & North Africa, South Asia, and South America were the three most terrorist-prone regions, while Australasia & Oceania, Central Asia, and East Asia were the safest in terms of terrorism. It is also worth noting that across all geographical regions, bombing and armed assault were the most common forms of attack.

Let’s delve deeper to see which countries from these terrorist-prone regions were contributing the highest number of terrorist incidents.

Code
df_countries_casualties = pd.DataFrame(df_attacks.groupby(["Country"])["Casualties"].sum().reset_index())
df_countries_terrorist_count = pd.DataFrame(df_attacks.groupby(["Country"])["Event ID"].count().reset_index())
df_countries_terrorist_count.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)
df_merged_casualties_count = df_countries_casualties.merge(df_countries_terrorist_count[["Country", "Number of Terrorist Attacks"]])
df_iso_codes = px.data.gapminder()[["country", "iso_alpha"]]
df_iso_codes.rename(columns={"country": "Country", "iso_alpha": "Country Code"}, inplace=True)
df_iso_codes.drop_duplicates(inplace=True)
df_iso_codes = df_iso_codes.reset_index()
df_iso_codes.drop(["index"], axis=1, inplace=True)
df_countries_terrorist_count = df_countries_terrorist_count.merge(df_iso_codes[['Country', 'Country Code']])
fig = px.choropleth(df_countries_terrorist_count, locations="Country Code",
                    color="Number of Terrorist Attacks",
                    hover_name="Country",
                    color_continuous_scale=px.colors.sequential.Plasma,
                    title="Terrorist Attacks (1970 - 2017)")
fig.update_layout(title_x=0.44)
fig.update_layout(height=500, width=880)
fig.show()

Figure 6: Countries with the Highest Number of Attacks

In Figure 6, we see that Iraq in the Middle East; Afghanistan, Pakistan, and India in South Asia, and Colombia in South America were the most terrorist-prone countries.

The analysis of global terrorism is incomplete without information on terrorist groups. So, let’s see the top 15 most notorious terrorist groups based on the number of casualties from the attacks they have orchestrated.

Code
groupwise_casualty_freq = pd.DataFrame(df_attacks.groupby("Terrorist Group")["Casualties"].sum()).reset_index()
groupwise_casualty_freq = groupwise_casualty_freq.sort_values(by="Casualties", ascending=False)[:16]
notorious_groups = list(groupwise_casualty_freq["Terrorist Group"])
notorious_groups.remove("Unknown")
df_notorious_groups = df_attacks[df_attacks["Terrorist Group"].isin(notorious_groups)]
df_notorious_groups = pd.DataFrame(df_notorious_groups.groupby(["Terrorist Group", "Year"])["Casualties"].sum().reset_index())
df_notorious_groups["Terrorist Group"] = df_notorious_groups["Terrorist Group"].replace(["Farabundo Marti National Liberation Front (FMLN)", "Islamic State of Iraq and the Levant (ISIL)", "Kurdistan Workers' Party (PKK)", "Liberation Tigers of Tamil Eelam (LTTE)", "New People's Army (NPA)", "Nicaraguan Democratic Force (FDN)", "Revolutionary Armed Forces of Colombia (FARC)", "Shining Path (SL)", "Tehrik-i-Taliban Pakistan (TTP)"], ["Farbundo Liberation", "ISIL", "Kurdistan W.", "Tamil Tigers", "New People's Army", "Nicaraguan Force", "Colombian Force", "Shining Path", "Taliban Pakistan"])
fig = px.line(df_notorious_groups, x="Year", y="Casualties", color="Terrorist Group", title='Attacks by different Terrorist Groups')
fig.update_layout(title_x=0.5)
fig.update_layout(height=500)
fig.update_layout({
    'plot_bgcolor': 'rgba(0,0,0,0)',
    'paper_bgcolor': 'rgba(0,0,0,0)'
})
fig.show()

Figure 7: Attacks by Different Terrorist Groups

One cannot fail to notice the 2001 peak for Al Qaida, which is widely regarded as the beginning of the rise of other Islamist extremist groups such as the Taliban, Al-Shabaab, and Boko Haram. As the steep lines after 2010 in Figure 7 suggest, the Taliban, Boko Haram, and ISIL appear to have killed more people over the last 50 years than the other twelve groups combined.

So what exactly do these terrorist groups target? Let’s find out.

Code
TOP_N = 11
target_freq = pd.DataFrame(df_attacks.groupby("Target Type")["Event ID"].count()).reset_index()
target_freq.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)
rem_freq = target_freq.sort_values(by="Number of Terrorist Attacks", ascending=False)[TOP_N:]
target_freq = target_freq.sort_values(by="Number of Terrorist Attacks", ascending=False)[:TOP_N]
target_freq = target_freq[target_freq['Target Type'] != "Unknown"]
fig = px.bar(target_freq, x='Target Type', y='Number of Terrorist Attacks', title="Common Targets of Terrorist Attacks")
fig.update_layout(title_x=0.5)
fig.update_layout(height=500)
fig.update_layout({
    'plot_bgcolor': 'rgba(0,0,0,0)',
    'paper_bgcolor': 'rgba(0,0,0,0)'
})
fig.show()

Figure 8: Common Targets of Terrorist attacks

Most attacks have targeted private citizens & property, the military, and the police. Private citizens and property are generally the easiest targets to attack, which might be one explanation for the high number of attacks against them.

Now, let’s analyze the relationship between terrorism and socio-economic factors like GDP and fertility rate.

Code
def map_region(country):
    region = list(df_attacks[df_attacks["Country"] == country]["Region"])[0]
    return region

country_freq = pd.DataFrame(df_attacks.groupby("Country")["Event ID"].count()).reset_index()
country_freq.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)
country_freq = country_freq.sort_values(by="Number of Terrorist Attacks", ascending=False)[:10]
country_freq["Region"] = country_freq["Country"].apply(map_region)
top_five_countries = list(country_freq["Country"].values)[:5]
country_freq_year = pd.DataFrame(df_attacks.groupby(["Year", "Country"])["Event ID"].count().reset_index())
country_freq_year = country_freq_year[country_freq_year["Country"].isin(top_five_countries)]
country_freq_year.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)

df_terrorist_gdp = df_gdp[(df_gdp["Country"].isin(top_five_countries)) & ((df_gdp["Year"] >= 1970) & (df_gdp["Year"] <= 2017))]
df_all_gdp = df_gdp[((df_gdp["Year"] >= 1970) & (df_gdp["Year"] <= 2017))]
df_all_gdp = df_all_gdp.dropna()
df_all_gdp = df_all_gdp.groupby("Year").mean(numeric_only=True).reset_index()  # numeric_only skips the "Country" text column
df_all_gdp.rename(columns={"GDP (in USD)": "World"}, inplace=True)
colorList = list(px.colors.qualitative.T10)
if colorList[0] != "black":
    colorList.insert(0, "black")
for country in top_five_countries:
    temp_gdp = df_terrorist_gdp[df_terrorist_gdp["Country"] == country]
    df_all_gdp[country] = list(temp_gdp["GDP (in USD)"])
fig = px.line(df_all_gdp, x='Year', y=df_all_gdp.columns[1:], title="GDP of Terrorist-prone Countries", color_discrete_sequence=colorList, labels={
                     "value": "GDP (in USD)",
                     "variable": ""
                 })
fig.update_layout({
    'plot_bgcolor': 'rgba(0,0,0,0)',
    'paper_bgcolor': 'rgba(0,0,0,0)'
})
fig.update_layout(title_x=0.5)
fig.update_layout(height=400, width=800)
fig.show()


df_all_fertility = df_population[(df_population["Year"] >= 1970) & (df_population["Year"] <= 2017)]
df_terrorist_fertility = df_population[(df_population["Country"].isin(top_five_countries)) & ((df_population["Year"] >= 1970) & (df_population["Year"] <= 2017))]
df_all_fertility = df_all_fertility.dropna()
df_all_fertility = df_all_fertility.drop(['Migrants (net)'], axis=1)
df_all_fertility = df_all_fertility.groupby("Year").mean(numeric_only=True).reset_index()  # numeric_only skips the "Country" text column
df_all_fertility.rename(columns={"Fertility Rate": "World"}, inplace=True)
for country in top_five_countries:
    temp_fertility = df_terrorist_fertility[df_terrorist_fertility["Country"] == country]
    df_all_fertility[country] = list(temp_fertility["Fertility Rate"])
fig = px.line(df_all_fertility, x='Year', y=df_all_fertility.columns[1:], title="Fertility Rate of Terrorist-prone Countries", color_discrete_sequence=colorList, labels={
                     "value": "Fertility Rate",
                     "variable": ""
                 })
fig.update_layout({
    'plot_bgcolor': 'rgba(0,0,0,0)',
    'paper_bgcolor': 'rgba(0,0,0,0)'
})
fig.update_layout(title_x=0.5)
fig.update_layout(height=400, width=800)
fig.show()

(a) GDP

(b) Fertility Rate

Figure 9: Socio-economic Aspects of Terrorist-prone Countries

Figure 9 shows the GDP and fertility rate of the five most terrorist-prone countries identified above. Over the given period, all of these countries had a lower GDP and a higher fertility rate than the global average. India is an exception in that its GDP grew faster than the global average; similarly, Colombia is an exception in that its fertility rate has stayed below the global average since the 1980s.

Code
# Drop identifier, free-text, and coordinate columns that should not feed the models
df_attacks = df_attacks.drop(columns=["Event ID", "Motive", "Latitude", "Longitude"], errors="ignore")
df_attacks = df_attacks.dropna()
df_attacks[['Country', 'Region', 'Province/State', 'City', 'Attack Type', 'Target Type', 'Terrorist Group', 'Weapon Type']] = df_attacks[['Country', 'Region', 'Province/State', 'City', 'Attack Type', 'Target Type', 'Terrorist Group', 'Weapon Type']].apply(LabelEncoder().fit_transform)

y = df_attacks["Casualties"]
X = df_attacks.drop(['Casualties'], axis=1)

# Split the data into train (70%), validation (15%), and test (15%) sets
X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.15, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=0.20, random_state=42)

scaler = RobustScaler()
X_train = scaler.fit_transform(X_train)  # fit the scaler on the training data only
X_val = scaler.transform(X_val)          # reuse the training statistics for validation
X_test = scaler.transform(X_test)        # and test data, to avoid data leakage

def create_bilstm():
    model = Sequential()
    model.add(Bidirectional(LSTM(128, activation='relu', input_shape=(12,1), return_sequences=True)))
    model.add(Dropout(0.2))
    model.add(Bidirectional(LSTM(64, activation='relu')))
    model.add(Dropout(0.2))
    model.add(Dense(32, activation='relu'))
    model.add(Dense(1))
    return model

def create_ffnn():
    model = Sequential()
    model.add(Dense(128, activation='relu', input_shape=(12,)))
    model.add(Dropout(0.3))
    model.add(Dense(64, activation='relu'))
    model.add(Dropout(0.2))
    model.add(Dense(32, activation='sigmoid'))
    model.add(Dense(16, activation='tanh'))
    model.add(Dense(1))
    return model

def create_cnn():
    model = Sequential()
    model.add(Conv1D(32, 3, activation='relu', input_shape=(12,1)))
    model.add(MaxPooling1D(2))
    model.add(Conv1D(64, 3, activation='relu'))
    model.add(MaxPooling1D(2))
    model.add(Flatten())
    model.add(Dense(64, activation='relu'))
    model.add(Dense(1))
    return model

def create_gru():
    model = Sequential()
    model.add(GRU(64, activation='tanh', input_shape=(12,1)))
    model.add(Dropout(0.2))
    model.add(Dense(32, activation='tanh'))
    model.add(Dropout(0.2))
    model.add(Dense(1, activation='linear'))
    return model

result = []

dlModels = {"Feed Forward NN": create_ffnn(), "CNN": create_cnn(), "GRU": create_gru(), "Bi-LSTM": create_bilstm()}

X_train_new = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
X_val_new = X_val.reshape(X_val.shape[0], X_val.shape[1], 1)
X_test_new = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)

for name, model in dlModels.items():
    start_time = time.time()
    model.compile(optimizer='adam', loss='mse')
    if name == "Feed Forward NN":
        # Dense layers take 2D input of shape (samples, features)
        model.fit(X_train, y_train, epochs=20, batch_size=128, validation_data=(X_val, y_val))
        y_pred = model.predict(X_test)
    else:
        # Conv1D, GRU, and Bi-LSTM expect 3D input of shape (samples, timesteps, 1)
        model.fit(X_train_new, y_train, epochs=20, batch_size=128, validation_data=(X_val_new, y_val))
        y_pred = model.predict(X_test_new)
    result.append([name, round(np.sqrt(mean_squared_error(y_test, y_pred)), 2), round(time.time() - start_time, 2)])


mlModels = {"Random Forest": RandomForestRegressor(), "K Neighbors": neighbors.KNeighborsRegressor(), "Decision Trees": DecisionTreeRegressor()}
for name, model in mlModels.items():
    start_time = time.time()
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    result.append([name, round(np.sqrt(mean_squared_error(y_test, pred)), 2), round(time.time() - start_time, 2)])

pd.options.display.float_format = '{:.2f}'.format
result_df = pd.DataFrame(result, columns=["Model", "Root Mean Squared Error", "Time (in seconds)"])
result_df.to_csv("../results/results.csv", index=False)  # same path the plotting cell reads from

Finally, let’s take the machine learning and deep learning algorithms out of our arsenal and tackle the problem of predicting the number of casualties of a given attack based on its date, country, region, state, city, suicidal intent, attack type, target type, terrorist group, and the weapon used. We believe such a model will be useful for intelligence agencies to assess the severity of attacks and prepare for them in the future.

The dataset was split into train, validation, and test sets in an approximate ratio of 70:15:15. The train and validation sets were used during the training phase, and the test set was used to evaluate each model based on the time it takes and its root-mean-squared error (RMSE). The results are shown in Figure 10.
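The two-stage split can be sanity-checked on synthetic data. Note that holding out 15% first and then carving 20% off the remainder actually yields roughly 68:17:15 rather than exactly 70:15:15; the sketch below (with made-up data) makes the resulting sizes explicit:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the feature matrix and casualty counts
X = np.arange(1000).reshape(-1, 1)
y = np.arange(1000)

# Stage 1: hold out 15% as the test set
X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.15, random_state=42)
# Stage 2: 20% of the remaining 85% becomes the validation set (0.85 * 0.20 = 17%)
X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=0.20, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # -> 680 170 150
```

Passing `test_size=15/85` in the second call would recover an exact 70:15:15 split if that were required.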

Code
result_df = pd.read_csv("../results/results.csv")
result_df = result_df.sort_values(by=['Root Mean Squared Error'])
matplotlib.rc_file_defaults()
ax1 = sns.set_style(style=None, rc=None)

fig, ax1 = plt.subplots(figsize=(12,6))
colors = ["#5D3FD3", "#5D3FD3", "#5D3FD3","#5D3FD3", "#0096FF", "#0096FF", "#0096FF"]
sns.barplot(data = result_df, x='Model', y='Root Mean Squared Error', alpha=0.5, ax=ax1, palette=colors)
ax1.set_xticklabels(ax1.get_xticklabels(), fontsize=12)
ax1.set_xlabel("Models", fontsize=14)
ax1.set_ylabel("Root Mean Squared Error", fontsize=14)
ax1.set_title("Efficiency of Models", fontsize=16)
ax2 = ax1.twinx()
ax2.set_ylabel("Time (in seconds)", fontsize=14)
dl = mpatches.Patch(color="#5D3FD3")
ml = mpatches.Patch(color="#0096FF")
custom_line = [Line2D([0], [0], color='#0096FF', lw=2), dl, ml]
leg = plt.legend(custom_line, ["Time", "DL Models", "ML Models"], loc="upper left")
for index, lh in enumerate(leg.legendHandles): 
    if index > 0:
        lh.set_alpha(0.5)
sns.lineplot(data = list(result_df["Time (in seconds)"]), marker='o', ax=ax2, color='#0096FF')
plt.show()

Figure 10: Efficiency of Models

The feed-forward neural network turned out to be the most effective model, achieving an RMSE of 8.68, while Decision Trees was the fastest, completing its predictions in 0.99 seconds. In general, the neural networks achieved a lower RMSE than the classical machine learning models but were also slower to train and test.

Our analysis ends here for now. In the future, the models will be tuned for their hyperparameters and trained on a larger dataset, combining and feature-engineering different socio-economic factors to achieve the lowest possible RMSE.
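As a first step toward that tuning, a cross-validated grid search over a couple of Random Forest hyperparameters might look like the sketch below. The parameter grid and the synthetic data are illustrative assumptions, not the values the final models will use:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the encoded attack features and casualty counts
X, y = make_regression(n_samples=500, n_features=12, noise=10.0, random_state=42)

# Illustrative grid; a real search would cover more parameters and values
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 10],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    scoring="neg_root_mean_squared_error",  # GridSearchCV maximises, hence the negation
    cv=3,
)
search.fit(X, y)
print(search.best_params_)
print(-search.best_score_)  # cross-validated RMSE of the best configuration
```

Tuning against cross-validated RMSE on the training data, rather than against the test set, keeps the final test-set comparison honest.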

More Animations

Code
# Reload the attack data: earlier cells dropped "Event ID" and label-encoded the
# text columns of df_attacks, which the animations below need in their original form
df_attacks = pd.read_csv("../data/globalterrorismdb_0718dist.csv", encoding="ISO-8859-1", low_memory=False)
df_attacks = df_attacks[["eventid", "iyear", "country_txt", "region_txt"]]
df_attacks.rename(columns={"eventid": "Event ID", "iyear": "Year", "country_txt": "Country", "region_txt": "Region"}, inplace=True)
df_countries_pivot = pd.DataFrame(df_attacks.groupby(["Country", "Year"]).count()).reset_index()
df_countries_pivot.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)
df_countries_pivot = df_countries_pivot.pivot_table(values = 'Number of Terrorist Attacks',index = ['Year'], columns = 'Country')
df_countries_pivot.fillna(0, inplace=True)
df_countries_pivot.sort_values(list(df_countries_pivot.columns),inplace=True)
df_countries_pivot = df_countries_pivot.sort_index()
df_countries_pivot = df_countries_pivot.cumsum()  # running totals for every country
bcr.bar_chart_race(df = df_countries_pivot,
                   n_bars = 10,
                   period_length=1000,
                   sort='desc',
                   title="Countries with the Highest Number of Terrorist Attacks",
                   filter_column_colors=True,
                   filename = None)

df_region_pivot = pd.DataFrame(df_attacks.groupby(["Region", "Year"]).count()).reset_index()
df_region_pivot.rename(columns={"Event ID": "Number of Terrorist Attacks"}, inplace=True)
df_region_pivot = df_region_pivot.pivot_table(values = 'Number of Terrorist Attacks',index = ['Year'], columns = 'Region')
df_region_pivot.fillna(0, inplace=True)
df_region_pivot.sort_values(list(df_region_pivot.columns),inplace=True)
df_region_pivot = df_region_pivot.sort_index()
df_region_pivot = df_region_pivot.cumsum()  # running totals for every region
bcr.bar_chart_race(df = df_region_pivot, 
                   n_bars = 12,
                   period_length=1000,
                   sort='desc',
                   title="Terrorist Attacks Based on Geographical Regions",
                   filter_column_colors=True,
                   filename = None)


References

Countries in the world by population (2023). Worldometer. Retrieved February 5,
    2023, from https://www.worldometers.info/world-population/population-by-country/

Information on more than 200,000 terrorist attacks. Global Terrorism Database.
   Retrieved February 5, 2023, from https://www.start.umd.edu/gtd/

Lai, N. T. C. (2023, February 3). World population (1955-2020). Kaggle. Retrieved February
   5, 2023, from https://www.kaggle.com/datasets/nguyenthicamlai/population-2022

Mishinev, T. (2022, September 9). World, region, country GDP/GDP per capita. Kaggle.
   Retrieved February 5, 2023, from
   https://www.kaggle.com/datasets/tmishinev/world-country-gdp-19602021

National Consortium for the Study of Terrorism and Responses to Terrorism. Global
   terrorism database. Kaggle. Retrieved February 5, 2023, from
   https://www.kaggle.com/datasets/START-UMD/gtd

World Bank. GDP (current US$). GDP National Accounts. Retrieved February 5, 2023, from
   https://data.worldbank.org/indicator/NY.GDP.MKTP.CD